Automated Exploratory Data Analysis - AutoViz
AutoViz is a Python library designed to automate the process of data visualization, making it easier for data scientists and analysts to quickly explore and visualize their datasets. It aims to provide an automatic, one-line solution to generate a comprehensive set of visualizations for data analysis without requiring extensive coding or configuration.
The key features and benefits of AutoViz are as follows:
Automated Visualization Generation: AutoViz automatically generates a wide variety of visualizations for each feature in the dataset, allowing users to quickly understand the distribution, patterns, and relationships within the data.
Ease of Use: The primary advantage of AutoViz is its simplicity. With just one line of code, you can create a comprehensive set of visualizations without the need for manual customization.
Support for Large Datasets: AutoViz is designed to handle large datasets efficiently, enabling users to visualize datasets with a large number of features and data points.
Wide Range of Visualizations: AutoViz supports various types of visualizations, including histograms, scatter plots, box plots, line plots, bar plots, and more. This helps users explore different aspects of the data quickly.
Handling of Numeric and Categorical Data: AutoViz can handle both numeric and categorical features, providing visualizations appropriate for each data type.
Handling of Missing Values: AutoViz can automatically handle missing values in the dataset and visualize their distribution and impact on other features.
Interactive Plots: Some of the visualizations generated by AutoViz are interactive, allowing users to explore the data further and zoom in on specific areas of interest.
Comparative Visualization: AutoViz includes visualizations that allow users to compare different features and analyze their relationships easily.
(1) Importing the Data
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Display all columns of the DataFrame
pd.set_option('display.max_columns', None)
# Display all rows of the DataFrame (left disabled; output becomes very long on large data)
# pd.set_option('display.max_rows', None)
dataset = pd.read_csv("/Users/lokaraju/JOBS/Loan Project /LoanDataset.csv")
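Before handing a file to an automated tool, it can help to check by hand what was loaded. The following is a minimal sketch of such a check, assuming only pandas; the `sample` frame, its column names, and the `summarize` helper are made up for illustration and are not part of the loan dataset or of AutoViz.

```python
import pandas as pd

# Small stand-in frame; column names and values are invented for illustration.
sample = pd.DataFrame({
    "loan_amount": [116500, 206500, None, 456500],
    "Gender": ["Male", "Female", "Male", None],
    "approved": [1, 0, 1, 1],
})


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing-value count, and unique-value count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a value
    })


summary = summarize(sample)
print(sample.shape)
print(summary)
```

Calling `summarize(dataset)` on the frame loaded above would give the same kind of overview for the real data.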
!pip -q install autoviz
from autoviz.AutoViz_Class import AutoViz_Class
Imported v0.1.730. After importing autoviz, execute '%matplotlib inline' to display charts inline.
# General form of the call (template only; `filename` is a placeholder, so it is left commented out):
# dfte = AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=1, lowess=False,
#                   chart_format='svg', max_rows_analyzed=150000, max_cols_analyzed=30, save_plot_dir=None)
AV = AutoViz_Class()
%matplotlib inline
viz = AV.AutoViz(
    "/Users/lokaraju/JOBS/Loan Project /LoanDataset.csv",
    sep=",",
    verbose=0,
    header=0,
    chart_format="svg",
    dfte=None,
    depVar="",
    lowess=False,
)
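The call above reads the CSV from disk again, although the same data is already held in `dataset`. The call signature also has a `dfte` parameter for passing a DataFrame, with an empty string as the file name. The sketch below shows that route; the helper names `autoviz_frame_kwargs` and `run_autoviz` are hypothetical, and the `target` argument only matters if the data has a dependent variable you want to set through `depVar`.

```python
import pandas as pd


def autoviz_frame_kwargs(df: pd.DataFrame, target: str = "") -> dict:
    """Build keyword arguments for AV.AutoViz when the data is already in memory."""
    if target and target not in df.columns:
        raise ValueError(f"target column {target!r} not found in the DataFrame")
    return {
        "filename": "",         # empty string: nothing is read from disk
        "sep": ",",
        "depVar": target,       # "" means no dependent variable
        "dfte": df,             # the DataFrame to analyze
        "header": 0,
        "verbose": 0,
        "lowess": False,
        "chart_format": "svg",
    }


def run_autoviz(df: pd.DataFrame, target: str = ""):
    """Run AutoViz on a DataFrame (hypothetical convenience wrapper)."""
    # Imported here so the helper above stays usable without autoviz installed.
    from autoviz.AutoViz_Class import AutoViz_Class

    return AutoViz_Class().AutoViz(**autoviz_frame_kwargs(df, target))
```

With the notebook's variables this would be `viz = run_autoviz(dataset)`. The output shown next comes from the original file-based call.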
Shape of your Data Set loaded: (148670, 32)
#######################################################################################
######################## C L A S S I F Y I N G V A R I A B L E S ####################
#######################################################################################
Classifying variables in data set...
Number of Numeric Columns = 7
Number of Integer-Categorical Columns = 2
Number of String-Categorical Columns = 8
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 13
Number of Numeric-Boolean Columns = 1
Number of Discrete String Columns = 0
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 1
Number of Columns to Delete = 0
32 Predictors classified...
1 variable(s) removed since they were ID or low-information variables
7 numeric variables in data exceeds limit, taking top 30 variables
Number of All Scatter Plots = 28
All Plots done
Time to run AutoViz = 27 seconds
###################### AUTO VISUALIZATION Completed ########################
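The log above sorts the columns into groups such as numeric, integer-categorical, string-boolean, and ID before any plotting happens. To give a feel for that kind of grouping, here is a rough sketch using pandas alone. It is not AutoViz's actual logic: the `classify_columns` function, its rules, the `max_categories` threshold, and the `demo` data are all assumptions made for illustration.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_columns(df: pd.DataFrame, max_categories: int = 30) -> dict:
    """Assign each column a rough kind based on dtype and unique-value count."""
    kinds = {}
    n_rows = len(df)
    for name in df.columns:
        col = df[name]
        n_unique = col.nunique(dropna=True)
        numeric = ptypes.is_numeric_dtype(col)
        integer = ptypes.is_integer_dtype(col)
        if n_unique <= 1:
            kinds[name] = "low-information"      # constant or empty column
        elif n_unique == 2:
            kinds[name] = "numeric-boolean" if numeric else "string-boolean"
        elif n_unique == n_rows and (integer or not numeric):
            kinds[name] = "id"                   # one distinct value per row
        elif integer and n_unique <= max_categories:
            kinds[name] = "integer-categorical"
        elif numeric:
            kinds[name] = "numeric"
        else:
            kinds[name] = "string-categorical"
    return kinds


# Invented demo data, not taken from the loan dataset.
demo = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105, 106],
    "income": [1740.0, 4980.0, 9480.0, 11880.0, 10440.0, 10080.0],
    "term": [360, 360, 300, 360, 300, 180],
    "region": ["north", "south", "north", "central", "south", "north"],
    "gender": ["M", "F", "M", "F", "M", "F"],
    "flag": [1, 0, 1, 1, 0, 1],
    "year": [2019] * 6,
})

for column, kind in classify_columns(demo).items():
    print(f"{column}: {kind}")
```

Simple rules like these misfire on small samples (a float column with a repeated value is still numeric, but a short text column with all-distinct values is flagged as an ID), which is one reason to read the tool's own classification summary before trusting the plots.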